
3 research outputs found

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

Author: A Schembri
B Woll
CG Fisher
H Cooper
J Forster
J Hu
O Koller
OA Crasborn
R Bank
R Sutton-Spence
S Tamura
T Pfister
T Stafylakis
T-Y Lin
U Agris
Valli C., University, G.
W Liu
Publication date: 01/01/2020

Recent progress in fine-grained gesture and action classification, and in machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign-instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data; the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale. (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks; we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.

Comment: Appears in: European Conference on Computer Vision 2020 (ECCV 2020). 28 pages

arXiv.org e-Print Archive

Crossref

UCL Discovery

Oxford University Research Archive

Automatic Segmentation of Sign Language into Subtitle-Units

Author: J Fenlon
M Sundermeyer
O Veksler
OA Crasborn
R De Beaugrande
SK Ko
Publication venue: Springer Science and Business Media LLC
Publication date: 23/08/2020

We present baseline results for a new task of automatic segmentation of Sign Language video into sentence-like units. We use a corpus of natural Sign Language video with accurately aligned subtitles to train a spatio-temporal graph convolutional network with a BiLSTM on 2D skeleton data to automatically detect the temporal boundaries of subtitles. In doing so, we segment Sign Language video into subtitle-units that can be translated into phrases in a written language. We achieve a ROC-AUC statistic of 0.87 at the frame level and 92% label accuracy within a time margin of 0.6 s of the true labels.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Like Hand, Like Mouth: On the Role of Gesture-Linked Mouth Actions in the Evolution of Language

Author: A Kendon
A Tramacere
AM Schel
B Woll
C Crockford
C Darwin
CA Padden
CS Peirce
DE Blasi
DP Vinson
E Gerardin
E Irvine
E Kohler
F De Waal
G Buccino
G Rizzolatti
J Call
J Decety
J Grèzes
JS Long
K Sterelny
L Ferrara
M Donald
M Gentilucci
M Graziano
M MacSweeney
M Tomasello
MA Arbib
N Fay
N Frishberg
OA Crasborn
P Bernardis
R Burling
RJ Planer
RW Wrangham
S Garrod
S Tanaka
SO Hwang
V Gallese
Publication venue: Springer Science and Business Media LLC

Crossref